This notebook builds a graph of the collaboration among contributors to a set of Git repositories (here, the repositories of the codeforamerica organization). Nodes are authors, and each edge is weighted by the number of parent/child links between the commits of the two authors it connects.


In [13]:
%matplotlib inline
from bigbang.git_repo import GitRepo
from bigbang import repo_loader

import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

In [14]:
repos = repo_loader.get_org_repos("codeforamerica")
repo = repo_loader.get_multi_repo(repos=repos)
full_info = repo.commit_data


Checking if cached
Running Entity Resolution on cfahelloworld
Checking if cached
Running Entity Resolution on shortstack
Checking if cached
Running Entity Resolution on cfawp2012
Checking if cached
Running Entity Resolution on Open311-Visualization
Checking if cached
Running Entity Resolution on open311
Checking if cached
Running Entity Resolution on follow-all
Checking if cached
Running Entity Resolution on gollum
Checking if cached
Running Entity Resolution on svg_canvas_experiments
Checking if cached
Running Entity Resolution on adopt-a-hydrant
Checking if cached
Running Entity Resolution on secretsauce
Checking if cached
Running Entity Resolution on datalogue
Checking if cached
Running Entity Resolution on open311dashboard
Checking if cached
Running Entity Resolution on designforamerica
Checking if cached
Running Entity Resolution on tipster
Checking if cached
Running Entity Resolution on georuby
Checking if cached
Running Entity Resolution on gem_template
Checking if cached
Running Entity Resolution on hubbuds
Checking if cached
Running Entity Resolution on Twitter-Collage
Checking if cached
Running Entity Resolution on flocky
Checking if cached
Running Entity Resolution on councilmatic
Checking if cached
Running Entity Resolution on classtalk
Checking if cached
Running Entity Resolution on 2010BasicCensusMap
Checking if cached
Running Entity Resolution on cfa-drupal-template
Checking if cached
Running Entity Resolution on wheresmyschoolbus
Checking if cached
Running Entity Resolution on cfa-drupal-example-module
Checking if cached
Running Entity Resolution on Wufoopress
Checking if cached
Running Entity Resolution on cfa_coder_sounds
Checking if cached
Running Entity Resolution on engagement_toolkit
Checking if cached
Running Entity Resolution on Catalyze
Checking if cached
Running Entity Resolution on lunch_roulette

Nodes will be Author objects, each of which holds a list of Commit objects.


In [15]:
class Commit:
    def __init__(self, message, hexsha, parents):
        self.message = message
        self.hexsha = hexsha
        self.parents = parents
        
    def __repr__(self):
        return ' '.join(self.message.split(' ')[:4])

    
class Author:
    def __init__(self, name, commits):
        self.name = name
        self.commits = commits
        # Count the commits actually passed in, rather than assuming exactly one.
        self.number_of_commits = len(commits)
    
    def add_commit(self, commit):
        self.commits.append(commit)
        self.number_of_commits += 1
        
    def __repr__(self):
        return self.name

We build a list of authors and, separately, a list of the committer names already seen, so that each author is added only once. When a commit belongs to an author who is already stored, we append it to that author's list of commits.


In [16]:
def get_authors():
    authors = []
    names = []

    for index, row in full_info.iterrows():
        name = row["Committer Name"]
        hexsha = row["HEXSHA"]
        parents = row["Parent Commit"]
        message = row["Commit Message"]

        if name not in names:
            authors.append(Author(name, [Commit(message, hexsha, parents)]))
            names.append(name)

        else:
            for author in authors:
                if author.name == name:
                    author.add_commit(Commit(message, hexsha, parents))
                    break  # each name is stored once, so stop at the first match

    return authors
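
The scan over the author list for every repeated name can be avoided by grouping commits in a dictionary keyed on committer name. What follows is a minimal sketch under stated assumptions, not the notebook's own code: sample_rows is made-up stand-in data that reuses the column names of full_info, and group_commits_by_name is a hypothetical helper name.

```python
# Hypothetical sample rows standing in for the rows of full_info (not real data).
sample_rows = [
    {"Committer Name": "alice", "HEXSHA": "a1", "Parent Commit": [],
     "Commit Message": "initial commit"},
    {"Committer Name": "bob", "HEXSHA": "b1", "Parent Commit": ["a1"],
     "Commit Message": "add readme"},
    {"Committer Name": "alice", "HEXSHA": "a2", "Parent Commit": ["b1"],
     "Commit Message": "fix typo"},
]


def group_commits_by_name(rows):
    """Map each committer name to a list of (hexsha, parents, message) tuples."""
    by_name = {}
    for row in rows:
        commit = (row["HEXSHA"], row["Parent Commit"], row["Commit Message"])
        # setdefault creates the list the first time a name is seen,
        # so no separate list of already-seen names is needed.
        by_name.setdefault(row["Committer Name"], []).append(commit)
    return by_name


grouped = group_commits_by_name(sample_rows)
```

Each lookup is a single dictionary access, so the cost grows with the number of commits rather than with commits times authors. The grouped tuples could then be wrapped in Commit and Author objects if the rest of the notebook expects them.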

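We build the graph by adding an edge whenever one author's commit is the parent of another author's commit. The first such link creates the edge with weight 1, and every further link between the same two authors increases that weight by 1. The graph is undirected, so the direction of the parent/child relation is not recorded. Note that the code does not skip the case where both commits belong to the same author, so an author who builds on their own commits also gets a self-loop.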


In [17]:
def make_graph(nodes):
    G = nx.Graph()
    
    for author in nodes:
        for commit in author.commits:
            for other in nodes:
                for other_commit in other.commits:
                    if commit.hexsha in other_commit.parents:
                        if G.has_edge(author, other):
                            G[author][other]['weight'] += 1
                        else:
                            G.add_edge(author, other, weight=1)
    
    return G
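
The four nested loops compare every commit with every other commit, which can be slow for an organization with many repositories. One possible alternative, sketched here under assumptions, is to index commits by hash first and then look up each parent directly. The sketch assumes that the Parent Commit field holds a list of parent hashes and that each hash belongs to a single committer; commits_by_author is made-up toy data and edge_weights is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy history: committer name -> list of (hexsha, parent hexshas).
commits_by_author = {
    "alice": [("a1", []), ("a2", ["b1"])],
    "bob": [("b1", ["a1"])],
}


def edge_weights(commits_by_author):
    """Count parent/child links between committers, ignoring direction."""
    # Index every commit by its hash so each parent lookup is a dict access.
    owner = {
        hexsha: name
        for name, commits in commits_by_author.items()
        for hexsha, _ in commits
    }
    weights = Counter()
    for child_author, commits in commits_by_author.items():
        for _, parents in commits:
            for parent in parents:
                parent_author = owner.get(parent)
                if parent_author is None:
                    continue  # parent commit is not in the data set
                # A frozenset is unordered, matching an undirected edge;
                # a link within one author yields a one-element set (self-loop).
                weights[frozenset((parent_author, child_author))] += 1
    return weights


weights = edge_weights(commits_by_author)
```

On the toy data, alice and bob are linked twice (a1 is the parent of b1, and b1 is the parent of a2), so their pair has weight 2. The resulting counts could be loaded into a networkx graph by adding one weighted edge per pair.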

In [ ]:
nodes = get_authors()
G = make_graph(nodes)

pos = nx.spring_layout(G, iterations=100)
nx.draw(G, pos, font_size=8, with_labels=False)
# nx.draw_networkx_labels(G, pos);
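
The drawing above gives every edge the same width, so the weights are not visible. One way to show them, sketched here as an assumption rather than as part of the original notebook, is to rescale the weights into a range of line widths. The helper name scaled_widths and its default bounds are hypothetical.

```python
def scaled_widths(weights, min_width=0.5, max_width=6.0):
    """Linearly rescale edge weights into the range [min_width, max_width]."""
    if not weights:
        return []
    lo, hi = min(weights), max(weights)
    if lo == hi:
        # All weights are equal, so there is nothing to scale.
        return [min_width] * len(weights)
    span = hi - lo
    return [
        min_width + (w - lo) / span * (max_width - min_width)
        for w in weights
    ]


widths = scaled_widths([1, 3, 5])
```

With the graph G from this notebook, the weights can be read in edge order from G.edges(data="weight"), and the resulting list can be passed to nx.draw through its width argument.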